We assume that you have already installed Anaconda.
If you don't have an environment, create one with the following command:
conda create --name YOURNAME scipy jupyter ipython
(Replace YOURNAME with whatever you want to call the new environment.)
Then, activate the new environment (recent conda releases use conda activate YOURNAME instead of the command below):
source activate YOURNAME
Librosa can then be installed with the following command:
conda install -c conda-forge librosa
NOTE: Windows users need to install audio decoding libraries separately. We recommend ffmpeg.
Start Jupyter:
jupyter notebook
and open a new notebook.
Then, run the following:
In [ ]:
import librosa
print(librosa.__version__)
In [ ]:
y, sr = librosa.load(librosa.util.example_audio_file())
print(len(y), sr)
Librosa has extensive documentation with examples.
When in doubt, go to http://librosa.github.io/librosa/
To load a signal at its native sampling rate, use sr=None
In [ ]:
y_orig, sr_orig = librosa.load(librosa.util.example_audio_file(),
                               sr=None)
print(len(y_orig), sr_orig)
Resampling is easy
In [ ]:
sr = 22050
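# Pass the sampling rates by keyword so the call does not depend on argument order
y = librosa.resample(y_orig, orig_sr=sr_orig, target_sr=sr)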
print(len(y), sr)
But what's that in seconds?
In [ ]:
print(librosa.samples_to_time(len(y), sr=sr))
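The conversion itself is just a division: a sample count divided by the sampling rate gives a duration in seconds. Here is a minimal sketch of that arithmetic. The helper name `samples_to_seconds` is made up for illustration and is not part of librosa.

```python
def samples_to_seconds(n_samples, sr=22050):
    """Convert a sample count to seconds (hypothetical helper, not librosa API)."""
    return n_samples / float(sr)

# One second of audio at 22050 Hz is 22050 samples
print(samples_to_seconds(22050, sr=22050))
# 88200 samples at 44100 Hz
print(samples_to_seconds(88200, sr=44100))
```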
In [ ]:
D = librosa.stft(y)
print(D.shape, D.dtype)
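The shape of the STFT matrix follows from the analysis parameters. The sketch below assumes librosa's defaults (n_fft=2048, hop_length=n_fft // 4, and centered frames), in which case there are 1 + n_fft // 2 frequency bins and 1 + len(y) // hop_length frames. The helper `expected_stft_shape` is hypothetical and only does the bookkeeping; it does not compute a transform.

```python
def expected_stft_shape(n_samples, n_fft=2048, hop_length=None):
    """Predict (n_bins, n_frames) for a centered STFT (hypothetical helper)."""
    if hop_length is None:
        # Assumed default: a quarter of the FFT window
        hop_length = n_fft // 4
    n_bins = 1 + n_fft // 2
    n_frames = 1 + n_samples // hop_length
    return n_bins, n_frames

# One second of audio at 22050 Hz with the assumed defaults
print(expected_stft_shape(22050))
# A smaller hop gives more frames for the same signal
print(expected_stft_shape(22050, hop_length=256))
```

If the numbers printed by the real STFT differ, the parameters in use are probably not the defaults assumed here.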
Often, we only care about the magnitude.
D contains both magnitude S and phase $\phi$.
In [ ]:
import numpy as np
In [ ]:
S, phase = librosa.magphase(D)
print(S.dtype, phase.dtype, np.allclose(D, S * phase))
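Note that the phase returned here is a unit-modulus complex factor rather than an angle, which is why multiplying S by phase recovers D. A minimal sketch of the same split on single complex numbers, using only the standard library (the helper `split_magphase` is hypothetical):

```python
import cmath

def split_magphase(z):
    """Split a complex number into (magnitude, unit-modulus phase factor)."""
    mag = abs(z)
    # Avoid division by zero: give a zero-magnitude value a phase factor of 1
    phase = z / mag if mag > 0 else complex(1.0, 0.0)
    return mag, phase

D_toy = [complex(3, 4), complex(0, -2), complex(-1, 0)]
for z in D_toy:
    mag, phase = split_magphase(z)
    # mag * phase should give back the original value
    print(mag, abs(phase), cmath.isclose(mag * phase, z))
```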
In [ ]:
C = librosa.cqt(y, sr=sr)
print(C.shape, C.dtype)
In [ ]:
# Exercise 0 solution
y2, sr2 = librosa.load( )
D = librosa.stft(y2, hop_length= )
Most features work either with audio or STFT input
In [ ]:
melspec = librosa.feature.melspectrogram(y=y, sr=sr)
# melspectrogram expects a power spectrogram (magnitude squared), not magnitude, as input
melspec_stft = librosa.feature.melspectrogram(S=S**2, sr=sr)
print(np.allclose(melspec, melspec_stft))
In [ ]:
# Displays are built with matplotlib
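import matplotlib.pyplot as plt
# Import the display submodule explicitly; some librosa versions do not load it automatically
import librosa.display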
# Let's make plots pretty
import matplotlib.style as ms
ms.use('seaborn-muted')
# Render figures interactively in the notebook
%matplotlib nbagg
# IPython gives us an audio widget for playback
from IPython.display import Audio
In [ ]:
plt.figure()
librosa.display.waveplot(y=y, sr=sr)
In [ ]:
plt.figure()
librosa.display.specshow(melspec, y_axis='mel', x_axis='time')
plt.colorbar()
In [ ]:
# Exercise 1 solution
X = librosa.feature.XX()
plt.figure()
librosa.display.specshow( )
The beat tracker returns the estimated tempo and beat positions (measured in frames)
In [ ]:
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
print(tempo)
print(beats)
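Frame indices become times once you know the hop length and sampling rate. The sketch below assumes a hop of 512 samples (which I believe is the beat tracker's default) and a sampling rate of 22050 Hz. librosa provides `librosa.frames_to_time` for this; the helper `frames_to_seconds` here is a hypothetical stand-in that shows the arithmetic.

```python
def frames_to_seconds(frames, sr=22050, hop_length=512):
    """Convert frame indices to seconds (hypothetical helper)."""
    return [f * hop_length / float(sr) for f in frames]

# Made-up beat positions, in frames
toy_beats = [0, 43, 86]
print(frames_to_seconds(toy_beats))
```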
Let's sonify it!
In [ ]:
clicks = librosa.clicks(frames=beats, sr=sr, length=len(y))
Audio(data=y + clicks, rate=sr)
Beats can be used to downsample features
In [ ]:
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
# sync lives in librosa.util in later librosa releases
chroma_sync = librosa.util.sync(chroma, beats)
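Beat-synchronous aggregation amounts to averaging the feature columns that fall between consecutive beats. Here is a simplified pure-Python sketch of that idea on toy data. The helper `sync_mean` is hypothetical, and it assumes that the first and last frames are added as segment boundaries, which I believe matches librosa's default padding behavior.

```python
def sync_mean(features, boundaries):
    """Average feature columns between consecutive boundaries.

    features: list of rows (one per feature), each a list of per-frame values
    boundaries: frame indices, e.g. beat positions
    """
    n_frames = len(features[0])
    # Keep only interior boundaries, then pad with the first and last frame
    inner = sorted(set(b for b in boundaries if 0 < b < n_frames))
    edges = [0] + inner + [n_frames]
    synced = []
    for row in features:
        out_row = []
        for start, end in zip(edges[:-1], edges[1:]):
            segment = row[start:end]
            out_row.append(sum(segment) / float(len(segment)))
        synced.append(out_row)
    return synced

# Two features over six frames, with beats at frames 2 and 4
toy_features = [[1, 1, 3, 3, 5, 5],
                [0, 2, 0, 2, 0, 2]]
print(sync_mean(toy_features, [2, 4]))
```

Six frames collapse to three beat-length segments, so the output has one column per inter-beat interval.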
In [ ]:
plt.figure(figsize=(6, 3))
plt.subplot(2, 1, 1)
librosa.display.specshow(chroma, y_axis='chroma')
plt.ylabel('Full resolution')
plt.subplot(2, 1, 2)
librosa.display.specshow(chroma_sync, y_axis='chroma')
plt.ylabel('Beat sync')
Recurrence matrices encode self-similarity
R[i, j] = similarity between frames (i, j)
Librosa computes recurrence between k-nearest neighbors.
In [ ]:
R = librosa.segment.recurrence_matrix(chroma_sync)
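To make the k-nearest-neighbor idea concrete, here is a simplified sketch that links each frame to its k closest frames under Euclidean distance. It is not librosa's implementation: the helper `knn_recurrence` is hypothetical, k is fixed by hand, and the input is a list of per-frame vectors (the transpose of librosa's feature layout).

```python
import math

def knn_recurrence(frames, k=2):
    """Binary recurrence matrix: R[i][j] = 1 if j is among i's k nearest frames."""
    n = len(frames)
    R = [[0] * n for _ in range(n)]
    for i in range(n):
        dists = []
        for j in range(n):
            if j == i:
                continue  # skip the trivial self-match
            # Euclidean distance between frame i and frame j
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(frames[i], frames[j])))
            dists.append((d, j))
        dists.sort()
        for _, j in dists[:k]:
            R[i][j] = 1
    return R

# Four toy frames forming two tight pairs
toy_frames = [[0, 0], [0, 1], [5, 5], [5, 6]]
for row in knn_recurrence(toy_frames, k=1):
    print(row)
```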
In [ ]:
plt.figure(figsize=(4, 4))
librosa.display.specshow(R)
We can include affinity weights for each link as well.
In [ ]:
R2 = librosa.segment.recurrence_matrix(chroma_sync,
                                       mode='affinity',
                                       sym=True)
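An affinity weight turns a distance into a similarity score, so that close frames get weights near 1 and distant frames get weights near 0. The sketch below uses an exponential kernel with a hand-picked bandwidth. Treat the exact formula and the bandwidth value as assumptions for illustration, since librosa estimates its bandwidth from the data. The helper `affinity` is hypothetical.

```python
import math

def affinity(distance, bandwidth=1.0):
    """Map a non-negative distance to a weight in (0, 1] (hypothetical helper)."""
    return math.exp(-distance / bandwidth)

# Weights decay as distance grows
for d in [0.0, 0.5, 1.0, 4.0]:
    print(d, affinity(d))
```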
In [ ]:
plt.figure(figsize=(5, 4))
librosa.display.specshow(R2)
plt.colorbar()
In [ ]:
# Exercise 2 solution
Separating harmonics from percussives is easy
In [ ]:
D_harm, D_perc = librosa.decompose.hpss(D)
y_harm = librosa.istft(D_harm)
y_perc = librosa.istft(D_perc)
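The idea behind this kind of separation can be sketched with median filters: harmonic energy forms horizontal ridges in the spectrogram (stable over time), and percussive energy forms vertical ridges (broadband and brief). The toy version below is a simplification and not librosa's implementation: it uses a tiny filter width and hard binary masks, whereas librosa uses much wider kernels and soft masks by default. All helper names are hypothetical.

```python
def median(values):
    """Median of a non-empty list."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return 0.5 * (ordered[mid - 1] + ordered[mid])

def median_filter_1d(seq, width=3):
    """Sliding median; windows are truncated at the edges."""
    half = width // 2
    return [median(seq[max(0, i - half): i + half + 1]) for i in range(len(seq))]

def hpss_masks(S, width=3):
    """Binary (harmonic, percussive) masks for a magnitude spectrogram.

    S: list of rows (frequency bins), each a list over time frames
    """
    n_bins, n_frames = len(S), len(S[0])
    # Harmonic estimate: median along time, within each frequency bin
    harm = [median_filter_1d(row, width) for row in S]
    # Percussive estimate: median along frequency, within each frame
    cols = [median_filter_1d([S[f][t] for f in range(n_bins)], width)
            for t in range(n_frames)]
    perc = [[cols[t][f] for t in range(n_frames)] for f in range(n_bins)]
    # Assign each bin to whichever estimate is larger (ties go to harmonic)
    mask_h = [[1 if harm[f][t] >= perc[f][t] else 0 for t in range(n_frames)]
              for f in range(n_bins)]
    mask_p = [[1 - m for m in row] for row in mask_h]
    return mask_h, mask_p

# Toy spectrogram: a sustained tone (row 2) plus a click (column 2)
S_toy = [[1.0 if (f == 2 or t == 2) else 0.0 for t in range(5)] for f in range(5)]
mask_h, mask_p = hpss_masks(S_toy)
for row in mask_p:
    print(row)
```

In the toy output, the percussive mask picks out the click column, except where it crosses the sustained tone.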
In [ ]:
Audio(data=y_harm, rate=sr)
In [ ]:
Audio(data=y_perc, rate=sr)
NMF is pretty easy also!
In [ ]:
# Fit the model
W, H = librosa.decompose.decompose(S, n_components=16, sort=True)
In [ ]:
plt.figure(figsize=(6, 3))
plt.subplot(1, 2, 1), plt.title('W')
# power_to_db replaces logamplitude in later librosa releases
librosa.display.specshow(librosa.power_to_db(W**2), y_axis='log')
plt.subplot(1, 2, 2), plt.title('H')
librosa.display.specshow(H, x_axis='time')
In [ ]:
# Reconstruct the signal using only the first component
S_rec = W[:, :1].dot(H[:1, :])
y_rec = librosa.istft(S_rec * phase)
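Reconstructing from one component is just an outer product: the first column of W (a spectral template) times the first row of H (its activation over time). A pure-Python sketch on toy matrices shows how selected components add up. The helper `reconstruct` and the toy values are made up for illustration.

```python
def reconstruct(W, H, components):
    """Sum the outer products W[:, k] * H[k, :] over the selected components."""
    n_bins, n_frames = len(W), len(H[0])
    S_rec = [[0.0] * n_frames for _ in range(n_bins)]
    for k in components:
        for f in range(n_bins):
            for t in range(n_frames):
                S_rec[f][t] += W[f][k] * H[k][t]
    return S_rec

# Two frequency bins, two components, three frames
W_toy = [[1.0, 0.0],
         [2.0, 1.0]]
H_toy = [[1.0, 0.0, 2.0],
         [0.0, 3.0, 1.0]]
# Only the first component
print(reconstruct(W_toy, H_toy, components=[0]))
# Both components, i.e. the full product W.dot(H)
print(reconstruct(W_toy, H_toy, components=[0, 1]))
```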
In [ ]:
Audio(data=y_rec, rate=sr)
This was just a brief intro, but there's lots more!
Read the docs: http://librosa.github.io/librosa/